ASYMPTOTIC EQUIVALENCE FOR DENSITY ESTIMATION AND 
GAUSSIAN WHITE NOISE: AN EXTENSION 

ESTER MARIUCCI 


Abstract. The aim of this paper is to present an extension of the well-known asymptotic 
equivalence between density estimation experiments and a Gaussian white 
noise model. Our extension consists in enlarging the nonparametric class of the 
admissible densities. More precisely, we propose a way to allow densities defined 
on any subinterval of R, and also some discontinuous or unbounded densities are 
considered (so long as the discontinuity and unboundedness patterns are somehow 
known a priori). The concept of equivalence that we shall adopt is in the sense of 
the Le Cam distance between statistical models. The results are constructive: all 
the asymptotic equivalences are established by constructing explicit Markov kernels. 


1. Introduction 

When looking for asymptotic results for some statistical model it is often useful to 
profit from a global asymptotic equivalence, in the Le Cam sense, in order to be allowed 
to work in a simpler but equivalent model. Indeed, proving an asymptotic equivalence 
result means that one can transfer asymptotic risk bounds for any inference problem 
from one model to the other, at least for bounded loss functions. Roughly speaking, 
saying that two models, $\mathscr{P}_1$ and $\mathscr{P}_2$, are equivalent means that they contain the same 
amount of information about the parameter that we are interested in. For the basic 
concepts and a detailed description of the notion of asymptotic equivalence, we refer to 
[6, 7]. A short review of this topic is given in the Appendix. 

In recent years, numerous papers have been published on the subject of nonparametric 
asymptotic equivalence. For a non-exhaustive list of the main ones among them, see, for 
example, the introduction in [8]. In this paper, we will focus on nonparametric density 
estimation experiments. 

The seminal paper on this subject is due to Nussbaum [9]. There, the asymptotic 
equivalence between an experiment given by n observations of a density f on [0,1] and 
a Gaussian white noise model: 

$$dy_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{n}}\,dW_t, \quad t \in [0,1],$$

was established. Over the years several generalizations of this result have been proposed, 
such as [1, 2, 5]. In [1], Brown et al. obtained the global asymptotic equivalence between 
a Poisson process with variable intensity and a Gaussian white noise experiment with 
drift. Via Poissonization, this result was also extended to density estimation 
models. In [5], Jähnisch and Nussbaum proved the global asymptotic equivalence between 
a nonparametric model associated with the observation of independent but not 
identically distributed random variables on the unit interval and a bivariate Gaussian 
white noise model. More closely related to our work is the result of Carter in [2]. In 
that paper, he proposed a new approach to establish the same normal approximations 
to density estimation experiments as in [9]. While the result in [9] is obtained by means 
of Poissonization, in [2] the key step is to connect the density estimation problem to a 
multinomial experiment and to approximate the latter by a multivariate normal experiment. 

The purpose of the present work is to generalize [9] and [2]. More precisely, the 
density estimation experiments that we consider consist of n independent observations 
$(Y_i)_{i=1}^n$ defined on an interval $I \subseteq \mathbb{R}$ from some unknown distribution $P_f^g$ having density 
(with respect to the Lebesgue measure on I) 

$$\frac{dP_f^g}{dx}(x) = f(x)g(x).$$

In particular, we do not require $I \subseteq \mathbb{R}$ to be bounded, as is generally done in the existing literature. The 
function g is supposed to be known whereas f is unknown and belongs to a certain 
nonparametric functional class $\mathscr{F}$. Formally, the statistical model we consider is 

(1)   $\mathscr{P}_n^g = \big(\mathbb{R}^n, \mathscr{B}(\mathbb{R}^n), \{P_f^g : f \in \mathscr{F}\}\big).$

The exact assumptions on f and g will be specified in Section 2. Here, let us only stress 
the fact that f has to be bounded away from zero and infinity and sufficiently regular, 
whereas g can be both unbounded and discontinuous. The advantage with respect to 
earlier works is that this framework allows us to treat densities of the form h = fg that are not 
necessarily bounded or smooth. See Section 3.1 for a discussion of the hypotheses. 

Finally, let us introduce the Gaussian white noise model. For that, let us denote 
by $(C, \mathscr{C})$ the space of continuous mappings from I into $\mathbb{R}$ endowed with its standard 
filtration and by $\mathbb{W}_f^g$ the law induced on $(C, \mathscr{C})$ by a stochastic process satisfying: 

(2)   $dY_t = \sqrt{f(t)g(t)}\,dt + \frac{dW_t}{2\sqrt{n}}, \quad t \in I,$

where $(W_t)_{t \in \mathbb{R}}$ is a Brownian motion on $\mathbb{R}$ conditional on $W_0 = 0$. Then we set 

(3)   $\mathscr{W}_n^g = \big(C, \mathscr{C}, \{\mathbb{W}_f^g : f \in \mathscr{F}\}\big).$

Let $\Delta$ be the Le Cam pseudo-distance between statistical models having the same 
parameter space. For the convenience of the reader a formal definition is given in Section 
A.1. Our main result is then as follows (see Theorem 3.1 for the precise statement): 

Main result 1.1. Let I be a possibly infinite subinterval of $\mathbb{R}$ and let $\mathscr{F}$ consist of 
functions bounded away from 0 and $\infty$, satisfying the regularity assumptions stated in 
Section 2. Then, we have 

(4)   $\lim_{n \to \infty} \Delta(\mathscr{P}_n^g, \mathscr{W}_n^g) = 0.$

In some special cases an explicit upper bound for the rate of convergence in (4) is 
available; see, e.g., Corollary 3.2. The structure of the proof follows Carter's in [2], 
but we depart from it in several respects. The basic idea is to use his 
multinomial-multivariate normal approximation, but some technical points have to be taken into 
account. One of these is that I may be infinite, so that, in particular, the subintervals 
$J_i$ into which it is partitioned cannot be of equal length. We choose intervals $J_i$ of varying 
length, according to the quantiles of $\nu_0$, the measure having density g with respect to 
Lebesgue. This kind of partition was already considered in [8]. 

The paper is organized as follows. Section 2 fixes the assumptions on the parameter 
space $\mathscr{F}$. Section 3 contains the statement of the main results and a discussion, 
while Section 4 is devoted to the proofs. The paper includes an Appendix recalling the 
definition and some useful properties of the Le Cam distance. 

2. The parameter space 

Fix a finite measure $\nu_0$ on a possibly infinite interval $I \subseteq \mathbb{R}$, admitting a density 
g with respect to Lebesgue. The class of functions $\mathscr{F}$ will be considered as a class 
of probability densities with respect to $\nu_0$, i.e. $\int_I f(x)g(x)\,dx = 1$. For each $f \in \mathscr{F}$, 
let $\nu$ (resp. $\hat\nu_m$) be the measure having f (resp. $\hat f_m$) as a density with respect to $\nu_0$ 
where, for every $f \in \mathscr{F}$, $\hat f_m(x)$ is defined as follows. Given a positive integer m, let 
$J_1 := I \cap (-\infty, v_1]$, $J_j := (v_{j-1}, v_j]$ for $j = 2, \dots, m-1$ and $J_m := I \cap (v_{m-1}, \infty)$, where the 
$v_j$'s are the quantiles for $\nu_0$, i.e. 


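(5)   $\mu_m := \nu_0(J_j) = \frac{\nu_0(I)}{m}, \quad \forall j = 1, \dots, m.$

Define 

$$x_j^* := \frac{\int_{J_j} x\,\nu_0(dx)}{\mu_m}$$

and 

(6)   $\hat f_m(x) := \begin{cases} \dfrac{\nu(J_1)}{\mu_m} & \text{if } x \in I \cap (-\infty, x_1^*], \\[2mm] \dfrac{1}{x_{j+1}^* - x_j^*} \left[ \dfrac{\nu(J_j)}{\mu_m}(x_{j+1}^* - x) + \dfrac{\nu(J_{j+1})}{\mu_m}(x - x_j^*) \right] & \text{if } x \in (x_j^*, x_{j+1}^*],\ j = 1, \dots, m-1, \\[2mm] \dfrac{\nu(J_m)}{\mu_m} & \text{if } x \in I \cap (x_m^*, \infty). \end{cases}$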
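The construction in (5) and (6) can be made concrete on a toy case. The following Python sketch is only an illustration under assumptions that are not taken from the paper: it uses $g(x) = e^{-x}$ on $I = [0, \infty)$, so that $\nu_0(I) = 1$, and $f(x) = 1 + 0.5(\cos x - \sin x)$, which is bounded away from 0 and $\infty$ and integrates to 1 against $\nu_0$. All function names are hypothetical, and indices are 0-based, so `v[j], v[j + 1]` delimit the interval called $J_{j+1}$ in the text.

```python
import math

def quantile_partition(m):
    """Cut points 0 = v[0] < v[1] < ... < v[m] = inf for g(x) = exp(-x) on [0, inf)."""
    # nu_0([0, v]) = 1 - exp(-v), hence the j/m quantile is -log(1 - j/m).
    return [0.0] + [-math.log(1.0 - j / m) for j in range(1, m)] + [math.inf]

def nu0_mass(a, b):
    """nu_0((a, b]) for the assumed density g(x) = exp(-x)."""
    return math.exp(-a) - (0.0 if math.isinf(b) else math.exp(-b))

def nu0_first_moment(a, b):
    """Integral of x * exp(-x) over (a, b], in closed form."""
    upper = 0.0 if math.isinf(b) else (b + 1.0) * math.exp(-b)
    return (a + 1.0) * math.exp(-a) - upper

def nu_mass(a, b):
    """nu((a, b]) for the assumed f(x) = 1 + 0.5 * (cos x - sin x)."""
    # Uses d/dx [exp(-x) sin x] = exp(-x) (cos x - sin x).
    upper = 0.0 if math.isinf(b) else math.exp(-b) * math.sin(b)
    return nu0_mass(a, b) + 0.5 * (upper - math.exp(-a) * math.sin(a))

def build_f_m(m):
    """Return cut points, the points x_j^*, the values nu(J_j)/mu_m and f_m as in (6)."""
    v = quantile_partition(m)
    mu = 1.0 / m  # mu_m = nu_0(I) / m with nu_0(I) = 1
    x_star = [nu0_first_moment(v[j], v[j + 1]) / mu for j in range(m)]
    heights = [nu_mass(v[j], v[j + 1]) / mu for j in range(m)]

    def f_m(x):
        if x <= x_star[0]:          # constant on the left tail
            return heights[0]
        if x > x_star[-1]:          # constant on the right tail
            return heights[-1]
        for j in range(m - 1):      # linear interpolation in between
            if x_star[j] < x <= x_star[j + 1]:
                w = (x - x_star[j]) / (x_star[j + 1] - x_star[j])
                return (1.0 - w) * heights[j] + w * heights[j + 1]

    return v, x_star, heights, f_m

v, x_star, heights, f_m = build_f_m(8)
print([round(c, 3) for c in v[:-1]], [round(h, 3) for h in heights])
```

Because the cut points are quantiles of $\nu_0$, the intervals get longer where g is small, which is what allows an unbounded I to be split into finitely many cells of equal $\nu_0$-mass.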


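We now explain the assumptions we will need to make on the parameter f. We 
require that: 

(H1) There exist constants $\kappa, M > 0$ such that $\kappa \le f(y) \le M$, for all $y \in I$ and $f \in \mathscr{F}$. 

The m introduced above will be considered as a function of n, $m = m_n$. We can thus 
consider $\widehat{\sqrt{f}}_m$, the linear interpolation of $\sqrt{f}$ constructed as $\hat f_m$ above, and introduce 
the quantities: 

$$H_m^2(f) := \int_I \Big(\sqrt{f(x)} - \sqrt{\hat f_m(x)}\Big)^2 \nu_0(dx),$$

$$A_m^2(f) := \int_I \Big(\widehat{\sqrt{f}}_m(y) - \sqrt{f(y)}\Big)^2 \nu_0(dy),$$

$$B_m^2(f) := \sum_{j=1}^m \left( \frac{\int_{J_j} \sqrt{f(y)}\,\nu_0(dy)}{\sqrt{\nu_0(J_j)}} - \sqrt{\nu(J_j)} \right)^2.$$

We will assume the existence of a sequence of discretizations $m = m_n$ such that: 

(C1)   $\lim_{n \to \infty} n \sup_{f \in \mathscr{F}} \big(H_m^2(f) + A_m^2(f) + B_m^2(f)\big) = 0.$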


3. Main results and discussion 

Using the notation introduced in Section 2, we now state our main result in terms of 
the models $\mathscr{P}_n^g$ and $\mathscr{W}_n^g$ defined in (1) and (3), respectively. 

Theorem 3.1. Let $\nu_0$ be a finite measure on a (possibly infinite) interval $I \subseteq \mathbb{R}$ having 
density g with respect to Lebesgue. Suppose that there exists a sequence $m = m_n$ such 
that every $f \in \mathscr{F}$ satisfies conditions (H1) and (C1). Then, for n big enough we have: 

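$$\Delta(\mathscr{P}_n^g, \mathscr{W}_n^g) = O\Big(\sqrt{n} \sup_{f \in \mathscr{F}} \big(A_m(f) + B_m(f) + H_m(f)\big) + \frac{m \ln m}{\sqrt{n}}\Big).$$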

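Corollary 3.2. Let I be a compact subset of $\mathbb{R}$. For fixed $\gamma \in (0,1]$ and $K, \kappa, M$ strictly 
positive constants, consider the functional class 

$$\mathscr{F}_{(\gamma, K, \kappa, M)} = \big\{ f \in C^1(I) : \kappa \le f(x) \le M,\ |f'(x) - f'(y)| \le K|x - y|^{\gamma},\ \forall x, y \in I \big\}.$$

Suppose $\mathscr{F} \subseteq \mathscr{F}_{(\gamma, K, \kappa, M)}$. Then 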

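$$\Delta(\mathscr{P}_n^g, \mathscr{W}_n^g) = O\Big(\sqrt{n}\big(\ell_m^{\gamma+1} + \ell_m\sqrt{\mu_m}\big) + \frac{m \ln m}{\sqrt{n}}\Big),$$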

where $\ell_m = \max_{j=1,\dots,m} |v_j - v_{j-1}|$, with the $v_j$'s defined as in Section 2. 

3.1. Existing literature and discussion. As already highlighted in the 
introduction, our result is a generalization of those in [9] and [2]. In order to discuss the 
link between our work and the previous ones, we recall the results contained in these 
papers. 

• Asymptotic equivalence of density estimation and Gaussian white noise, [9]. 

In this paper Nussbaum establishes a global asymptotic equivalence between 
the problem of density estimation from an i.i.d. sample and a Gaussian white 
noise model. More precisely, let $(Y_i)_{i=1}^n$ be i.i.d. random variables with density f 
on [0,1] with respect to the Lebesgue measure. The densities f are the unknown 
parameters and they are supposed to belong to a certain nonparametric class 
$\mathscr{F}$ subject to a Hölder restriction: $|f(x) - f(y)| \le C|x - y|^{\alpha}$ with $\alpha > \frac{1}{2}$ and 
a positivity restriction: $f(x) \ge \varepsilon > 0$. Let us denote by $\mathscr{P}_{1,n}$ the statistical 
model associated with the observation of the $Y_i$'s. Furthermore, let $\mathscr{P}_{2,n}$ be the 
experiment in which one observes a stochastic process $(Y_t)_{t \in [0,1]}$ such that 

$$dY_t = \sqrt{f(t)}\,dt + \frac{1}{2\sqrt{n}}\,dW_t, \quad t \in [0,1],$$

where $(W_t)_{t \in [0,1]}$ is a standard Brownian motion. Then the main result in [9] is 
that $\Delta(\mathscr{P}_{1,n}, \mathscr{P}_{2,n}) \to 0$ as $n \to \infty$. 

This is done by first showing that the result holds for certain subsets $\mathscr{F}_n(f_0)$ 
of the class $\mathscr{F}$ described above. Then it is shown that one can estimate 
$f_0$ rapidly enough to fit the various pieces together. Without entering into any 
detail, let us just mention that the key steps are a Poissonization technique and 
the use of a functional KMT inequality. 

• Deficiency distance between multinomial and multivariate normal experiments, [2]. 

In this paper Carter establishes a global asymptotic equivalence between a 
density estimation model and a Gaussian white noise model by bounding the Le 
Cam distance between multinomial and multivariate normal random variables. 
More precisely, let us denote by $\mathcal{M}(n, \theta)$ the multinomial distribution, where 
$\theta := (\theta_1, \dots, \theta_m)$. Denote its covariance matrix by $nV_\theta$: its (i, j)th element equals 
$n\theta_i(1 - \theta_i)\delta_{ij} - n\theta_i\theta_j(1 - \delta_{ij})$. 

The main result is an upper bound for the Le Cam distance $\Delta(\mathscr{M}, \mathscr{N})$ between 
the models $\mathscr{M} := \{\mathcal{M}(n, \theta) : \theta \in \Theta\}$ and $\mathscr{N} := \{\mathcal{N}(n\theta, nV_\theta) : \theta \in \Theta\}$, under 
some regularity assumptions on $\Theta$. In particular, Carter proves that 

$$\Delta(\mathscr{M}, \mathscr{N}) \le C_R' \frac{m \ln m}{\sqrt{n}} \quad \text{provided} \quad \sup_{\theta \in \Theta} \frac{\max_i \theta_i}{\min_i \theta_i} \le C_R < \infty,$$

for a constant $C_R'$ that depends only on $C_R$. From this inequality Carter can 
recover most of the same results as Nussbaum [9] under stronger regularity assumptions 
on $\mathscr{F}$: here $\mathscr{F}$ is a class of smooth, differentiable densities f on the interval [0,1] 
such that there exist strictly positive constants $\varepsilon, M, \gamma$ such that $\varepsilon \le f \le M$ 
and 

$$|f'(x) - f'(y)| \le M|x - y|^{\gamma}, \quad \text{for all } x, y \in [0,1].$$

Let us briefly explain how one can use a bound on the distance between multinomial 
and multivariate normal variables to make assertions about density estimation 
experiments. The idea is to see the multinomial experiment as the result 
of grouping independent observations from a continuous density into subsets. 
Using the square root as a variance-stabilizing transformation, these multinomial 
variables can be asymptotically approximated by normal variables with 
constant variances. These normal variables, in turn, are approximations to the 
increments of the Brownian motion processes over the sets in the partition. 
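The variance-stabilizing effect of the square root mentioned above can be checked numerically. The sketch below is an illustration and not part of either paper: it computes, by direct summation of the probability mass function, the mean and variance of $\sqrt{Z}$ for a single multinomial cell count $Z \sim \mathrm{Binomial}(n, p)$. The function name and the values of n and p are assumptions chosen for the example. By the delta method the variance is roughly $(1-p)/4$, hence close to the constant 1/4 when the cell probability p is small.

```python
import math

def sqrt_binomial_moments(n, p):
    """Exact mean and variance of sqrt(Z) for Z ~ Binomial(n, p)."""
    mean = 0.0
    second = 0.0  # E[(sqrt Z)^2] = E[Z]
    for k in range(n + 1):
        # Work in log space to avoid overflow in the binomial coefficient.
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log1p(-p))
        w = math.exp(log_pmf)
        mean += w * math.sqrt(k)
        second += w * k
    return mean, second - mean ** 2

for p in (0.02, 0.05, 0.1):
    mean, var = sqrt_binomial_moments(2000, p)
    # Compare with the heuristic values sqrt(n p) and (1 - p) / 4.
    print(p, round(mean, 3), round(math.sqrt(2000 * p), 3), round(var, 4), (1 - p) / 4)
```

The raw counts have variance $np(1-p)$, which changes with p, whereas the variance of the square roots barely moves. This is why the normal approximation is stated for $\sqrt{Z_i}$ with covariance proportional to the identity.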
Our work can be seen as a generalization of the previously cited works: to see this, 
it is enough to take $g(x) = \mathbb{1}_{[0,1]}(x)$ and apply Corollary 3.2. However, it differs from 
Nussbaum's and Carter's results in several respects. First of all, we do not need the 
random variables to be defined on [0,1]: the observations may be defined on a 
possibly infinite interval I of $\mathbb{R}$. Secondly, in our setting the positivity restriction on the 
densities can be removed. Indeed, as a parametric example, we can consider truncated 
Gamma distributions on [0, L], that is, distributions having a density h with respect to 
the Lebesgue measure: 

$$h(x) = \frac{\exp(-\theta x)\,\theta^n x^{n-1}}{\int_0^L \exp(-\theta y)\,\theta^n y^{n-1}\,dy}\, \mathbb{1}_{[0,L]}(x).$$

We can apply Theorem 3.1 taking $\mathscr{F} = \{f_\theta : \theta \in \mathbb{R}_{>0}\}$ and 

$$f_\theta(x) = \frac{\exp(-\theta x)\,\theta^n}{\int_0^L \exp(-\theta y)\,\theta^n y^{n-1}\,dy}, \qquad g(x) = x^{n-1}.$$


More generally, density functions h that can be written in form of a product are 
commonly used in statistics. Again, one could cite as a simple case the problem of a 
parametric estimation for a Weibull density, see, e.g. ii- Generally speaking, the 
present work can be useful whenever the random variables $Y_i$ do not admit a smooth 
density h with respect to Lebesgue, but one nevertheless has some information on the 
discontinuity structure, namely one knows g in the decomposition h(x) = f(x)g(x). 


4. Proofs 

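4.1. Proof of Theorem 3.1. We will proceed in four steps. 
Step 1. By means of Facts A.2 and A.3 we get 

$$\Big\| \bigotimes_{i=1}^n P_f^g - \bigotimes_{i=1}^n P_{\hat f_m}^g \Big\|_{TV} \le H\Big( \bigotimes_{i=1}^n P_f^g, \bigotimes_{i=1}^n P_{\hat f_m}^g \Big) \le \sqrt{n H^2\big(P_f^g, P_{\hat f_m}^g\big)}.$$

Hence, denoting by $\hat{\mathscr{P}}_n^g$ the statistical model associated with the family of probabilities 
$\{P_{\hat f_m}^g : f \in \mathscr{F}\}$, Property A.1 gives 

(7)   $\Delta(\mathscr{P}_n^g, \hat{\mathscr{P}}_n^g) \le \sup_{f \in \mathscr{F}} \sqrt{n \int_I \Big(\sqrt{f(x)} - \sqrt{\hat f_m(x)}\Big)^2 g(x)\,dx} = \sqrt{n} \sup_{f \in \mathscr{F}} H_m(f).$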

Step 2. Following the same approach as in [2], we introduce an auxiliary multinomial 
experiment to get closer to a normal one representing the increments of $(Y_t)_{t \in I}$ defined 
as in (2). The multinomial experiment is linked with the density estimation model in the 
following way: Let $Y_1, \dots, Y_n$ be i.i.d. random variables with density $\hat f_m g$ with respect 
to Lebesgue and define the multinomial experiment by grouping their observations into 
subsets. More precisely, let us introduce the random variables: 

$$Z_i = \sum_{j=1}^n \mathbb{1}_{J_i}(Y_j), \quad i = 1, \dots, m.$$
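The grouping step is simply a histogram on the partition $J_1, \dots, J_m$. A minimal Python sketch follows, with a hypothetical function name and made-up data. It assumes the interior cut points $v_1 < \dots < v_{m-1}$ are given and that the cells are closed on the right, so a point equal to a cut point is counted in the cell on its left.

```python
from bisect import bisect_left

def group_counts(sample, cuts):
    """Counts (Z_1, ..., Z_m) of sample points falling in
    J_1 = (-inf, v_1], J_j = (v_{j-1}, v_j], J_m = (v_{m-1}, inf),
    where cuts = [v_1, ..., v_{m-1}] is sorted increasingly."""
    counts = [0] * (len(cuts) + 1)
    for y in sample:
        # bisect_left returns the number of cut points strictly below y,
        # which is exactly the 0-based index of the right-closed cell of y.
        counts[bisect_left(cuts, y)] += 1
    return counts

print(group_counts([0.5, 1.0, 1.5, 2.0, 2.5, 3.0], [1.0, 2.0]))
```

The vector of counts forgets where each observation sits inside its cell. The randomization that restores a sample from the counts is the object of Lemma 4.1.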

Observe that the law of the vector $(Z_1, \dots, Z_m)$ is multinomial $\mathcal{M}(n; \gamma_1, \dots, \gamma_m)$ where 

$$\gamma_i = \int_{J_i} f(x)g(x)\,dx, \quad i = 1, \dots, m;$$

here we have used the fact that $\int_{J_i} f(x)g(x)\,dx = \int_{J_i} \hat f_m(x)g(x)\,dx$. Let us denote by 
$\mathscr{M}_m$ the statistical model associated with the observation of $(Z_1, \dots, Z_m)$. Clearly 
$\delta(\hat{\mathscr{P}}_n^g, \mathscr{M}_m) = 0$. Indeed, $\mathscr{M}_m$ is the image experiment by the random variable 
$S : I^n \to \{0, \dots, n\}^m$ defined as 

$$S(x_1, \dots, x_n) = \big( \#\{j : x_j \in J_1\}; \dots; \#\{j : x_j \in J_m\} \big),$$

where $\#A$ denotes the cardinality of the set A. To conclude the second step we now prove 
that the multinomial experiment is as informative as $\hat{\mathscr{P}}_n^g$. 

Lemma 4.1. 

$$\delta(\mathscr{M}_m, \hat{\mathscr{P}}_n^g) = 0.$$

Proof. We need to produce an explicit Markov kernel that allows one to reproduce the 
density $\hat f_m g$ given an observation from the multinomial model. For all $j = 2, \dots, m-1$, 
let $V_j(x)$ be the (compactly supported) triangular shaped function such that 

(8)   $V_j(x_{j-1}^*) = 0, \quad V_j(x_j^*) = \frac{1}{\nu_0(J_j)} = \frac{1}{\mu_m}, \quad V_j(x_{j+1}^*) = 0,$

linearly interpolated between these values. We also define analogously (compactly 
supported) trapezoidal shaped functions $V_1$, $V_m$; the former is supported on $[0, x_2^*]$, where 
it is the linear interpolation of 

$$V_1(0) = V_1(x_1^*) = \frac{1}{\nu_0(J_1)} \quad \text{and} \quad V_1(x_2^*) = 0.$$

$V_m$ is defined analogously on $[x_{m-1}^*, 1]$ with $V_m(x_{m-1}^*) = 0$ and $V_m(x_m^*) = V_m(x) = \frac{1}{\nu_0(J_m)}$ 
for all $x \ge x_m^*$. The required (randomized) Markov kernel is then 

$$K\big((k_1, \dots, k_m), A\big) = \int_A V_{X(k_1, \dots, k_m)}(x)\,\nu_0(dx), \quad \forall (k_1, \dots, k_m) \in \mathbb{N}^m,\ \sum_i k_i = n,\ A \subseteq \mathbb{R},$$

where $X(k_1, \dots, k_m) \in \{1, \dots, m\}$ is a randomly chosen integer assigning to j the weight $\frac{k_j}{n}$. 

□ 

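Step 3. Let us denote by $\mathscr{N}_m$ the statistical model associated with the observation 
of m independent Gaussian variables $\mathcal{N}(\sqrt{n\gamma_i}, \frac{1}{4})$, $i = 1, \dots, m$. Since 
$\max_i \gamma_i / \min_i \gamma_i \le M/\kappa$ by (H1), one can apply Theorem A.7, obtaining 

$$\Delta(\mathscr{M}_m, \mathscr{N}_m) = O\Big( \frac{m \ln m}{\sqrt{n}} \Big).$$

Here the O depends only on M and $\kappa$. 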

Step 4. Finally, we conclude the proof of Theorem 3.1 by showing that 

(9)   $\Delta(\mathscr{N}_m, \mathscr{W}_n^g) \le 2\sqrt{n} \sup_{f \in \mathscr{F}} \big(A_m(f) + B_m(f)\big).$

As a preliminary remark note that $\mathscr{W}_n^g$ is equivalent to the model that observes a 
trajectory from: 

$$d\bar y_t = \sqrt{f(t)}\,g(t)\,dt + \frac{\sqrt{g(t)}}{2\sqrt{n}}\,dW_t, \quad t \in I.$$

In order to prove (9) we proceed in the following way: First of all, we prove that $\mathscr{N}_m$ 
is asymptotically equivalent to the model that observes the increments on the intervals $J_i$ of $(\bar y_t)_{t \in I}$. 
Secondly, we show that the increments of $(\bar y_t)_{t \in I}$ are more informative than another 
Gaussian process, say $(Y_t^*)_{t \in I}$, that turns out to be very close to $(\bar y_t)_{t \in I}$ in the total 
variation distance. We then conclude the asymptotic equivalence between $\mathscr{N}_m$ and $\mathscr{W}_n^g$ 
by observing that the increments of $(\bar y_t)_{t \in I}$ are obviously less informative than $\mathscr{W}_n^g$. 

Let us denote by $\bar Y_j$ the increments of the process $(\bar y_t)$ over the intervals $J_j$, 
$j = 1, \dots, m$, i.e. 

$$\bar Y_j := \bar y_{v_j} - \bar y_{v_{j-1}} \sim \mathcal{N}\Big( \int_{J_j} \sqrt{f(y)}\,\nu_0(dy),\ \frac{\nu_0(J_j)}{4n} \Big)$$
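and denote by $\bar{\mathscr{N}}_m$ the statistical model associated with the distributions of these 
increments. As announced, we start by bounding the Le Cam distance between $\mathscr{N}_m$ and $\bar{\mathscr{N}}_m$, 
showing that 

(10)   $\Delta(\mathscr{N}_m, \bar{\mathscr{N}}_m) \le 2\sqrt{n} \sup_{f \in \mathscr{F}} B_m(f), \quad \text{for all } m.$

In this regard, remark that the experiment $\bar{\mathscr{N}}_m$ is equivalent to another experiment, say 
$\bar{\mathscr{N}}_m^*$, that observes m independent Gaussian random variables of means 
$\frac{2\sqrt{n}}{\sqrt{\nu_0(J_j)}} \int_{J_j} \sqrt{f(y)}\,\nu_0(dy)$, $j = 1, \dots, m$, and variances identically 1. Hence, using also 
Property A.1 and the bounds on the total variation distance collected in Appendix A, we get: 

$$\Delta(\mathscr{N}_m, \bar{\mathscr{N}}_m) \le \Delta(\mathscr{N}_m, \bar{\mathscr{N}}_m^*) \le \sup_{f \in \mathscr{F}} \sqrt{ \sum_{j=1}^m \Big( \frac{2\sqrt{n}}{\sqrt{\nu_0(J_j)}} \int_{J_j} \sqrt{f(y)}\,\nu_0(dy) - 2\sqrt{n\,\nu(J_j)} \Big)^2 }.$$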


Using similar ideas as in Section 8.2 of [2] and Lemma 3.2 of [8], we introduce a new 
stochastic process constructed from the random variables $\bar Y_j$. To that end recall the 
notation introduced in the proof of Lemma 4.1, see (8), and define 

(11)   $Y_t^* = \sum_{j=1}^m \bar Y_j \int_{I \cap [0,t]} V_j(y)\,\nu_0(dy) + \frac{1}{2\sqrt{n}} \sum_{j=1}^m \sqrt{\nu_0(J_j)}\,B_j(t),$

where the $(B_j(t))_t$ are independent centered Gaussian processes with variances 

$$\mathrm{Var}(B_j(t)) = \int_{I \cap [0,t]} V_j(y)\,\nu_0(dy) - \Big( \int_{I \cap [0,t]} V_j(y)\,\nu_0(dy) \Big)^2.$$

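By construction, $(Y_t^*)$ is a Gaussian process with mean and variance given by, 
respectively: 

$$E[Y_t^*] = \sum_{j=1}^m E[\bar Y_j] \int_{I \cap [0,t]} V_j(y)\,\nu_0(dy) = \sum_{j=1}^m \Big( \int_{J_j} \sqrt{f(y)}\,\nu_0(dy) \Big) \int_{I \cap [0,t]} V_j(y)\,\nu_0(dy),$$

$$\mathrm{Var}[Y_t^*] = \sum_{j=1}^m \mathrm{Var}[\bar Y_j] \Big( \int_{I \cap [0,t]} V_j(y)\,\nu_0(dy) \Big)^2 + \frac{1}{4n} \sum_{j=1}^m \nu_0(J_j)\,\mathrm{Var}(B_j(t))$$

$$= \frac{1}{4n} \int_{I \cap [0,t]} \sum_{j=1}^m \nu_0(J_j)\,V_j(y)\,\nu_0(dy) = \frac{1}{4n} \int_{I \cap [0,t]} 1\,\nu_0(dy) = \frac{\nu_0(I \cap [0,t])}{4n}.$$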

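Therefore, $(Y_t^*)_{t \in I}$ can be represented as 

$$Y_t^* = \int_{I \cap [0,t]} \widehat{\sqrt{f}}_m(y)\,\nu_0(dy) + \int_{I \cap [0,t]} \frac{\sqrt{g(s)}}{2\sqrt{n}}\,dW_s^*, \quad t \in I,$$

for some Brownian motion $(W_s^*)$, where 

$$\widehat{\sqrt{f}}_m(x) := \sum_{j=1}^m \Big( \int_{J_j} \sqrt{f(y)}\,\nu_0(dy) \Big) V_j(x).$$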


Applying Fact A.5 we get that the total variation distance between the process $(Y_t^*)_{t \in I}$ 
constructed from the random variables $\bar Y_j$, $j = 1, \dots, m$, and the Gaussian process $(\bar y_t)_{t \in I}$ 
is bounded by 

$$\sqrt{ 4n \int_I \Big( \widehat{\sqrt{f}}_m(y) - \sqrt{f(y)} \Big)^2 \nu_0(dy) } = 2\sqrt{n}\,A_m(f),$$

as wanted. 
4.2. Proof of Corollary 3.2. We start by proving a technical lemma needed for the 
proof of Corollary 3.2. Recall the following notation: $\mu_m = \nu_0(J_j)$, for all j, and 
$\ell_m = \max_{j=1,\dots,m} |v_j - v_{j-1}|$, with the $v_j$'s defined as in Section 2. 

Lemma 4.2. If f £ then 

with the O depending on K , M and k. 


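Proof. Let us consider the Taylor expansion of f at the points $x_j^*$, where x denotes a point 
in $J_j$: 

(12)   $f(x) = f(x_j^*) + f'(x_j^*)(x - x_j^*) + R(x).$

The smoothness condition on f allows us to bound the error R as follows: 

$$|R(x)| = \big| f(x) - f(x_j^*) - f'(x_j^*)(x - x_j^*) \big| = |f'(\xi_j) - f'(x_j^*)|\,|x - x_j^*| \le K \ell_m^{\gamma+1},$$

where $\xi_j$ is a certain point between x and $x_j^*$. 

By the linear character of $\hat f_m$, we can write: 

$$\hat f_m(x) = \hat f_m(x_j^*) + \hat f_m'(x_j^*)(x - x_j^*),$$

where $\hat f_m'$ denotes the left or right derivative of $\hat f_m$ at $x_j^*$ depending on whether $x < x_j^*$ 
or $x > x_j^*$; this equals $f'(t)$ for some $t \in J_j$, which allows us to exploit the Hölder 
condition. Indeed, if $x \in J_j$, $j = 1, \dots, m$, then there exists $t \in J_j$ such that: 

$$|f(x) - \hat f_m(x)| \le |f(x_j^*) - \hat f_m(x_j^*)| + |f'(x_j^*) - f'(t)|\,|x - x_j^*| + |R(x)|$$

$$\le |f(x_j^*) - \hat f_m(x_j^*)| + K|t - x_j^*|^{\gamma}|x - x_j^*| + K\ell_m^{\gamma+1} \le |f(x_j^*) - \hat f_m(x_j^*)| + 2K\ell_m^{\gamma+1}.$$

Using (12) and the fact that $\int_{J_j} (x - x_j^*)\,\nu_0(dx) = 0$, one gets: 

$$|f(x_j^*) - \hat f_m(x_j^*)| = \frac{1}{\nu_0(J_j)} \Big| \int_{J_j} \big(f(x_j^*) - f(x)\big)\,\nu_0(dx) \Big| = \frac{1}{\nu_0(J_j)} \Big| \int_{J_j} R(x)\,\nu_0(dx) \Big| \le K\ell_m^{\gamma+1}.$$

Moreover, observe that, for all $x \in J_j$, $j = 1, \dots, m$, $\big| f(x) - \frac{\nu(J_j)}{\nu_0(J_j)} \big|$ is bounded by 
$3K\ell_m^{\gamma+1} + M\ell_m$; indeed: 

$$\Big| f(x) - \frac{\nu(J_j)}{\nu_0(J_j)} \Big| = |f(x) - \hat f_m(x_j^*)| \le |f(x) - \hat f_m(x)| + |\hat f_m(x) - \hat f_m(x_j^*)|$$

$$\le 3K\ell_m^{\gamma+1} + |\hat f_m'(x_j^*)(x - x_j^*)| \le 3K\ell_m^{\gamma+1} + M\ell_m.$$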

Collecting all the pieces together we find 

f (/(*) - f m (x)) 2 Mdx) < 2/z„(3 Kf+-y + Mi m ) 2 + 18 K 2 £ 2 + 2 P 


□ 


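Proof of Corollary 3.2. First of all, let us observe that $\nu_0(I)$ is finite; indeed, the 
positivity condition on f ($f(x) \ge \kappa > 0$) implies that $\nu_0(I) \le \frac{1}{\kappa}$. Also, by means of the fact 
that $f(x) \ge \kappa$ for all $x \in I$ one can write: 

$$H_m^2(f) = \int_I \Big(\sqrt{f(x)} - \sqrt{\hat f_m(x)}\Big)^2 g(x)\,dx = \int_I \Big( \frac{f(x) - \hat f_m(x)}{\sqrt{f(x)} + \sqrt{\hat f_m(x)}} \Big)^2 g(x)\,dx \le \frac{1}{4\kappa} \int_I \big(f(x) - \hat f_m(x)\big)^2 g(x)\,dx.$$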

A straightforward application of Lemma 14.21 gives 


Hi 
The same bound holds for $A_m(f)$, since if $f \in \mathscr{F}_{(\gamma, K, \kappa, M)}$ then $\sqrt{f}$ belongs to a class of 
the same type, with different constants. 

Moreover, one can see that $B_m$ converges with the same rate as $A_m$. This may be done 
by explicit computations, see [8], Lemma 3.10 for more details. □ 

Appendix A. Background 

A.1. Le Cam theory of statistical experiments. A statistical model or experiment 
is a triplet $\mathscr{P}_j = (\mathscr{X}_j, \mathscr{A}_j, \{P_{j,\theta};\ \theta \in \Theta\})$ where $\{P_{j,\theta};\ \theta \in \Theta\}$ is a family of probability 
distributions all defined on the same $\sigma$-field $\mathscr{A}_j$ over the sample space $\mathscr{X}_j$ and $\Theta$ is the 
parameter space. The deficiency $\delta(\mathscr{P}_1, \mathscr{P}_2)$ of $\mathscr{P}_1$ with respect to $\mathscr{P}_2$ quantifies "how 
much information we lose" by using $\mathscr{P}_1$ instead of $\mathscr{P}_2$ and it is defined as 
$\delta(\mathscr{P}_1, \mathscr{P}_2) = \inf_K \sup_{\theta \in \Theta} \|KP_{1,\theta} - P_{2,\theta}\|_{TV}$, where TV stands for "total variation" and the infimum 
is taken over all "transitions" K (see [6], page 18). The general definition of transition 
is quite involved but, for our purposes, it is enough to know that (possibly randomized) 
Markov kernels are special cases of transitions. By $KP_{1,\theta}$ we mean the image measure 
of $P_{1,\theta}$ via the Markov kernel K, that is 

$$KP_{1,\theta}(A) = \int_{\mathscr{X}_1} K(x, A)\,P_{1,\theta}(dx), \quad \forall A \in \mathscr{A}_2.$$

The experiment $K\mathscr{P}_1 = (\mathscr{X}_2, \mathscr{A}_2, \{KP_{1,\theta};\ \theta \in \Theta\})$ is called a randomization of $\mathscr{P}_1$ by 
the Markov kernel K. When the kernel K is deterministic, that is $K(x, A) = \mathbb{1}_A(S(x))$ 
for some random variable $S : (\mathscr{X}_1, \mathscr{A}_1) \to (\mathscr{X}_2, \mathscr{A}_2)$, the experiment $K\mathscr{P}_1$ is called 
the image experiment by the random variable S. The Le Cam distance $\Delta$ is defined as 
the symmetrization of $\delta$ and it defines a pseudometric. When $\Delta(\mathscr{P}_1, \mathscr{P}_2) = 0$ the two 
statistical models are said to be equivalent. Two sequences of statistical models $(\mathscr{P}_1^n)_{n \in \mathbb{N}}$ 
and $(\mathscr{P}_2^n)_{n \in \mathbb{N}}$ are called asymptotically equivalent if $\Delta(\mathscr{P}_1^n, \mathscr{P}_2^n)$ tends to zero as n goes 
to infinity. A very interesting feature of the $\Delta$-distance is that it can also be translated 
in terms of statistical decision theory. Let $\mathscr{D}$ be any (measurable) decision space and 
let $L : \Theta \times \mathscr{D} \to [0, \infty)$ denote a loss function. Let $\|L\| = \sup_{(\theta, z) \in \Theta \times \mathscr{D}} L(\theta, z)$. Let $\pi_i$ 
denote a (randomized) decision procedure in the i-th experiment. Denote by $R_i(\pi_i, L, \theta)$ 
the risk from using procedure $\pi_i$ when L is the loss function and $\theta$ is the true value of 
the parameter. Then, an equivalent definition of the deficiency is: 

$$\delta(\mathscr{P}_1, \mathscr{P}_2) = \inf_{\pi_1} \sup_{\pi_2} \sup_{\theta \in \Theta} \sup_{L : \|L\| = 1} \big| R_1(\pi_1, L, \theta) - R_2(\pi_2, L, \theta) \big|.$$

Thus $\Delta(\mathscr{P}_1, \mathscr{P}_2) < \varepsilon$ means that for every procedure $\pi_i$ in problem i there is a 
procedure $\pi_j$ in problem j, $\{i, j\} = \{1, 2\}$, with risks differing by at most $\varepsilon$, uniformly over all 
bounded L and $\theta \in \Theta$. In particular, when minimax rates of convergence in a nonparametric 
estimation problem are obtained in one experiment, the same rates automatically 
hold in any asymptotically equivalent experiment. There is more: when explicit 
transformations from one experiment to another are obtained, statistical procedures can be 
carried over from one experiment to the other. 

There are various techniques to bound the Le Cam distance. We report below only 
the properties that are useful for our purposes. For the proofs see, e.g., [6, 7]. 

Property A.1. Let $\mathscr{P}_j = (\mathscr{X}, \mathscr{A}, \{P_{j,\theta};\ \theta \in \Theta\})$, j = 1, 2, be two statistical models 
having the same sample space and define $\Delta_0(\mathscr{P}_1, \mathscr{P}_2) := \sup_{\theta \in \Theta} \|P_{1,\theta} - P_{2,\theta}\|_{TV}$. Then, 
$\Delta(\mathscr{P}_1, \mathscr{P}_2) \le \Delta_0(\mathscr{P}_1, \mathscr{P}_2)$. 

In particular, Property A.1 allows us to bound the Le Cam distance between statistical 
models sharing the same sample space by means of classical bounds for the total variation 
distance. To that aim, we collect below some useful results. 
Fact A.2. Let $P_1$ and $P_2$ be two probability measures on $\mathscr{X}$, dominated by a common 
measure $\xi$, with densities $g_i = \frac{dP_i}{d\xi}$, i = 1, 2. Define 

$$L_1(P_1, P_2) = \int_{\mathscr{X}} |g_1(x) - g_2(x)|\,\xi(dx),$$

$$H(P_1, P_2) = \Big( \int_{\mathscr{X}} \big( \sqrt{g_1(x)} - \sqrt{g_2(x)} \big)^2 \xi(dx) \Big)^{1/2}.$$

Then, 

$$\|P_1 - P_2\|_{TV} = \frac{1}{2} L_1(P_1, P_2) \le H(P_1, P_2).$$

Fact A.3. Let P and Q be two product measures defined on the same sample space: 
$P = \otimes_{i=1}^n P_i$, $Q = \otimes_{i=1}^n Q_i$. Then 

$$H^2(P, Q) \le \sum_{i=1}^n H^2(P_i, Q_i).$$
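Facts A.2 and A.3 are easy to check numerically on finite sample spaces. The Python sketch below is an illustration only. The two distributions, the value n = 3 and the function names are arbitrary choices made for the example. It builds the n-fold product laws explicitly and compares the total variation distance with the Hellinger distance H (normalized as in Fact A.2, without a factor 1/2) and with the tensorized bound $\sqrt{n}\,H(P_1, Q_1)$ used in Step 1 of the proof of Theorem 3.1.

```python
import math
from itertools import product

def total_variation(p, q):
    """Half the L1 distance between two probability vectors."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def hellinger(p, q):
    """H(P, Q) = sqrt(sum (sqrt p - sqrt q)^2), as normalized in Fact A.2."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2 for a, b in zip(p, q)))

def product_law(p, n):
    """Probabilities of all n-tuples under the n-fold product of p."""
    return [math.prod(p[i] for i in idx) for idx in product(range(len(p)), repeat=n)]

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
n = 3
tv_n = total_variation(product_law(p, n), product_law(q, n))
h_n = hellinger(product_law(p, n), product_law(q, n))
# Fact A.2 gives tv_n <= h_n; Fact A.3 gives h_n <= sqrt(n) * H(p, q).
print(tv_n, h_n, math.sqrt(n) * hellinger(p, q))
```

Enumerating the product space costs $3^n$ terms here, so this direct check is only feasible for very small n. The point of Fact A.3 is precisely that the bound for the product can be read off from the one-dimensional marginals.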


Fact A.4. Let $Q_1 \sim \mathcal{N}(\mu_1, \sigma_1^2)$ and $Q_2 \sim \mathcal{N}(\mu_2, \sigma_2^2)$. Then 


WQi 


Q2WTV < 



(Mi ~ hi ) 2 
2o\ 


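Fact A.5. For i = 1, 2, let $Q_i$ be the law on $(C, \mathscr{C})$ of two Gaussian processes 
of the form 

$$X_t^i = \int_0^t h_i(s)\,ds + \int_0^t \sigma(s)\,dW_s, \quad t \in I,$$

where $h_i \in L_2(\mathbb{R})$ and $\sigma(s) \in \mathbb{R}_{>0}$. Then: 

$$\|Q_1 - Q_2\|_{TV} \le \sqrt{ \int_I \frac{(h_1(s) - h_2(s))^2}{\sigma^2(s)}\,ds }.$$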


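Property A.6. Let $\mathscr{P}_i = (\mathscr{X}_i, \mathscr{A}_i, \{P_{i,\theta},\ \theta \in \Theta\})$, i = 1, 2, be two statistical models. 
Let $S : \mathscr{X}_1 \to \mathscr{X}_2$ be a sufficient statistic such that the distribution of S under $P_{1,\theta}$ is 
equal to $P_{2,\theta}$. Then $\Delta(\mathscr{P}_1, \mathscr{P}_2) = 0$. 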


Finally, we recall the following result that allows us to bound the Le Cam distance 
between multinomial and Gaussian variables. In accordance with the notation used 
throughout the paper, $\mathcal{M}(n, \theta)$ stands for a multinomial distribution of parameters $(n, \theta)$. 

Theorem A.7. (See [2], Theorem 1 and Sections 7.1, 7.2) Let $\mathscr{P} = \{P_\theta : \theta \in \Theta_R\}$, 
where $P_\theta = \mathcal{M}(n, \theta)$ and $\Theta_R \subset \mathbb{R}^m$ consists of all vectors of probabilities such that 

$$\frac{\max_i \theta_i}{\min_i \theta_i} \le R.$$

Let $\mathscr{Q} = \{Q_\theta : \theta \in \Theta_R\}$ where $Q_\theta$ is the multivariate normal distribution with vector 
mean $(\sqrt{n\theta_1}, \dots, \sqrt{n\theta_m})$ and diagonal covariance matrix $\frac{1}{4} I_m$. Then 

$$\Delta(\mathscr{P}, \mathscr{Q}) \le C_R \frac{m \ln m}{\sqrt{n}}$$

for a constant $C_R$ that depends only on R. 